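Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state "task automata" from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP, by treating it as a partially observable MDP and using off-the-shelf algorithms for hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy. It also provides an interpretable encoding of high-level environmental and task features, so a human can readily verify that the agent has learnt a coherent task with no misspecifications. In addition, we take steps towards ensuring that the learnt automaton is environment-agnostic, making it well suited for use in transfer learning. Finally, we provide experimental results to illustrate our algorithm's performance in different environments and tasks, and its ability to incorporate prior domain knowledge to facilitate more efficient learning.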
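To make the product-MDP idea concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation. It pairs a toy deterministic environment with a hand-written deterministic finite automaton and steps them together. The corridor environment, the labels `key` and `door`, the reward rule, and all function names are hypothetical choices made for illustration. The sketch does not cover the learning steps (fitting a hidden Markov model and distilling the automaton).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskAutomaton:
    """Deterministic finite automaton over high-level labels (hypothetical toy)."""
    initial: str
    accepting: frozenset
    transitions: dict  # (automaton_state, label) -> next automaton_state

    def step(self, q, label):
        # Labels without a listed transition leave the automaton state unchanged.
        return self.transitions.get((q, label), q)


def product_step(automaton, env_step, labeller, product_state, action):
    """Advance the product of environment and automaton by one action."""
    s, q = product_state
    s_next = env_step(s, action)
    q_next = automaton.step(q, labeller(s_next))
    # Assumed reward rule: pay 1 once, on first entering an accepting state.
    newly_accepted = q_next in automaton.accepting and q not in automaton.accepting
    return (s_next, q_next), (1.0 if newly_accepted else 0.0)


# Toy task (assumption): visit the "key" cell first, then the "door" cell.
automaton = TaskAutomaton(
    initial="q0",
    accepting=frozenset({"q2"}),
    transitions={("q0", "key"): "q1", ("q1", "door"): "q2"},
)


def env_step(s, action):
    # Corridor of five cells indexed 0..4; action is -1 (left) or +1 (right).
    return max(0, min(4, s + action))


def labeller(s):
    # Cell 0 holds the door and cell 4 holds the key; other cells are unlabelled.
    return {0: "door", 4: "key"}.get(s, "none")


def run(actions, start=2):
    """Roll out a list of actions and return the final product state and total reward."""
    state, total = (start, automaton.initial), 0.0
    for a in actions:
        state, r = product_step(automaton, env_step, labeller, state, a)
        total += r
    return state, total
```

In this toy, the reward for reaching the door depends on whether the key was visited earlier, so it is non-Markovian in the corridor position alone but Markovian in the pair (position, automaton state). For example, `run([1, 1, -1, -1, -1, -1])` collects the reward, while `run([-1, -1, 1, 1, 1, 1])` visits the same cells in the wrong order and does not.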
Fake review identification is an important topic that has attracted the interest of experts around the world. Identifying fake reviews remains difficult for researchers, and fake review detection faces several primary challenges. We propose an initial study that investigates fake reviews using sentiment analysis. We identify ten research papers that address fake reviews and discuss currently available solutions for predicting or detecting them. These papers also show the distribution of fake and truthful reviews through sentiment analysis. We summarize and compare previous studies related to fake reviews. We highlight the most significant challenges in the sentiment evaluation process and demonstrate that these challenges significantly affect the sentiment scores used to identify fake feedback.
This paper presents a multi-agent Deep Reinforcement Learning (DRL) framework for autonomous control and integration of renewable energy resources into smart power grid systems. In particular, the proposed framework jointly considers demand response (DR) and distributed energy management (DEM) for residential end-users. DR has a widely recognized potential for improving power grid stability and reliability while reducing end-users' energy bills. However, conventional DR techniques have several shortcomings, such as the inability to handle operational uncertainties while incurring end-user disutility, which prevents widespread adoption in real-world applications. The proposed framework addresses these shortcomings by implementing DR and DEM based on a real-time pricing strategy that is achieved using deep reinforcement learning. Furthermore, the framework enables the power grid service provider to leverage distributed energy resources (i.e., rooftop PV panels and battery storage) as dispatchable assets to support the smart grid during peak hours, thus achieving management of distributed energy resources. Simulation results based on the Deep Q-Network (DQN) demonstrate significant improvements in the 24-hour cumulative profit for both prosumers and the power grid service provider, as well as major reductions in the utilization of the power grid reserve generators.
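Electric utilities rely on short-term demand forecasting to proactively adjust production and distribution in anticipation of major variations. This systematic review analyzes 240 works published in scholarly journals between 2000 and 2019 that focus on applying artificial intelligence (AI), statistical, and hybrid models to short-term load forecasting (STLF). This work represents the most comprehensive review of the subject to date. A complete analysis of the literature is conducted to identify the most popular and accurate techniques as well as existing gaps. The findings show that although artificial neural networks (ANN) continue to be the most commonly used standalone technique, researchers have increasingly opted for hybrid combinations of different techniques to leverage the combined advantages of individual methods. The review demonstrates that these hybrid combinations can commonly achieve prediction accuracy exceeding 99%. The most successful horizon for short-term forecasting is identified as one-day-ahead prediction at hourly intervals. The review identifies a deficiency in access to the datasets needed to train the models. It also identifies a significant gap in research on regions other than Asia, Europe, North America, and Australia.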
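The abstract does not state how prediction accuracy is measured. One common convention in load forecasting, assumed here purely for illustration, is accuracy = 100% minus the mean absolute percentage error (MAPE). The sketch below computes both; the load values are invented placeholders, not results from the review.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def accuracy_from_mape(actual, forecast):
    """Accuracy under the assumed convention: 100 minus MAPE."""
    return 100.0 - mape(actual, forecast)


# Illustrative placeholder loads (e.g. in MW), not data from any study.
actual_load = [100.0, 200.0, 400.0]
forecast_load = [101.0, 198.0, 400.0]

print(f"MAPE: {mape(actual_load, forecast_load):.2f}%")
print(f"Accuracy: {accuracy_from_mape(actual_load, forecast_load):.2f}%")
```

Under this convention, a reported accuracy above 99% corresponds to a MAPE below 1%. Other studies may define accuracy differently, so figures are only comparable when the metric is the same.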